Tutorial

Basic Usage

x Function

x is a shorthand function that pushes a task onto the crawler's task queue.

To get TypeScript typings (for IntelliSense / autocomplete) when using CommonJS imports, require the default export:

const x = require("crawlx").default;

After the HTTP request completes, you can handle the response either with a callback in the task's options or through the Promise returned by x. In both cases, a res property is added to the task object once the request finishes.

let task = {
  url: "http://books.toscrape.com/",
  callback({ res }) {
    console.log(res.statusCode); // 200
  }
};
x(task).then(task => {
  console.log(task.res.statusCode); // 200
});
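The callback-plus-Promise pattern described above can be tried without touching the network. The following is only a sketch of the pattern, not crawlx's implementation: `runTask` and `fakeRequest` are hypothetical names, and the response object is made up for the demonstration.

```javascript
// Hypothetical stand-in for an HTTP client; resolves with a made-up response.
function fakeRequest(url) {
  return Promise.resolve({ url, statusCode: 200 });
}

// Hypothetical helper: attaches `res` to the task, invokes the optional
// callback, then resolves with the same task object.
async function runTask(task, request = fakeRequest) {
  task.res = await request(task.url);
  if (typeof task.callback === "function") {
    await task.callback(task);
  }
  return task;
}

const seen = [];
const demo = runTask({
  url: "http://books.toscrape.com/",
  callback({ res }) {
    seen.push(`callback: ${res.statusCode}`);
  }
}).then(task => {
  seen.push(`promise: ${task.res.statusCode}`);
  return seen;
});
```

In this sketch the callback runs before the Promise resolves, and both receive the same task object, so either place can read `res`.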

Access crawler with x

x.crawler; // the crawler instance
x.agent; // same as x.crawler.agent (the got instance)
x.agentOptions.headers["Cookie"] = "name=jack"; // x.agentOptions is x.crawler.agent.defaults.options

Task options

crawlx uses got to make HTTP requests.

task = {
  url: "https://www.example.com",

  // callback function invoked after the HTTP request finishes
  callback(task, crawler) {},

  // tasks with higher priority are executed earlier
  priority: 0,

  // additional key-value pairs will be passed to agent's request options
  // full list: https://github.com/sindresorhus/got

  headers: {},
  method: "POST",
  params: {},
  body: {}
};
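To make the effect of `priority` concrete, here is a minimal sketch of a queue that hands out higher-priority tasks first. It is not crawlx's internal queue: the `PriorityQueue` class is hypothetical, and treating a missing `priority` as 0 and keeping insertion order for ties are assumptions made for the illustration.

```javascript
// Hypothetical queue: tasks with a larger `priority` value are taken first;
// tasks with equal priority keep their insertion order (an assumption).
class PriorityQueue {
  constructor() {
    this.items = [];
    this.counter = 0; // insertion order, used as a tie-breaker
  }

  push(task) {
    this.items.push({
      task,
      priority: task.priority || 0, // assumption: missing priority means 0
      order: this.counter++
    });
    this.items.sort((a, b) => b.priority - a.priority || a.order - b.order);
  }

  shift() {
    const entry = this.items.shift();
    return entry ? entry.task : undefined;
  }

  get length() {
    return this.items.length;
  }
}

const queue = new PriorityQueue();
queue.push({ url: "https://www.example.com/a", priority: 0 });
queue.push({ url: "https://www.example.com/b", priority: 5 });
queue.push({ url: "https://www.example.com/c" }); // no priority given

// Drain the queue and record the order in which tasks come out.
const order = [];
while (queue.length > 0) {
  order.push(queue.shift().url);
}
```

Here the task for `/b` comes out first because it has the highest priority; `/a` and `/c` follow in the order they were pushed.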

Extend instance

Create a new instance with custom options.

const x = require("crawlx").default;

const crawlerOptions = { concurrency: 10 };
const x2 = x.create(crawlerOptions);

Crawler options:

{
  concurrency: 1,

  // cheerio options
  cheerio: {
    keepRelativeUrl: false, // when true, relative URLs are not transformed to absolute ones
    disable: false // when true, the cheerio plugin is disabled for responses
  },

  // got options for creating instance
  got: {}
};
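The `keepRelativeUrl` option concerns turning relative URLs into absolute ones. How crawlx performs that transformation is not shown here; the sketch below only illustrates the general idea using Node's built-in `URL` class, with a hypothetical helper named `toAbsolute`.

```javascript
// Hypothetical helper: resolve a link found on a page against that page's URL.
function toAbsolute(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

const pageUrl = "http://books.toscrape.com/catalogue/page-2.html";

// A relative link is resolved against the directory of the page.
const next = toAbsolute("page-3.html", pageUrl);

// ".." walks up one directory level before resolving.
const home = toAbsolute("../index.html", pageUrl);

// An absolute link is returned unchanged.
const external = toAbsolute("https://www.example.com/", pageUrl);
```

With these inputs, `next` is `http://books.toscrape.com/catalogue/page-3.html` and `home` is `http://books.toscrape.com/index.html`.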

You can access the crawler instance with x.crawler and the got instance with x.agent or x.crawler.agent.

x.agent.defaults.options.prefixUrl = "https://example.com";
x.agentOptions.prefixUrl = "https://example.com"; // same setting via the x.agentOptions shorthand
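With a prefix URL set, task URLs can be written relative to it. The sketch below mimics the way got documents `prefixUrl` (the prefix is prepended, a trailing slash is added if missing, and the input must not start with a slash). `applyPrefix` is a hypothetical helper written for illustration, not part of got or crawlx.

```javascript
// Hypothetical helper mimicking how a prefix URL is combined with a task URL.
function applyPrefix(prefixUrl, url) {
  if (url.startsWith("/")) {
    throw new Error("url must not start with a slash when a prefix is set");
  }
  // Add the trailing slash to the prefix when it is missing.
  const prefix = prefixUrl.endsWith("/") ? prefixUrl : prefixUrl + "/";
  return prefix + url;
}

const full = applyPrefix("https://example.com", "catalogue/page-2.html");
const same = applyPrefix("https://example.com/", "catalogue/page-2.html");
```

Both calls produce `https://example.com/catalogue/page-2.html`, so the prefix can be written with or without a trailing slash.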